AITopics | bert score

bert score

Fake News Detection After LLM Laundering: Measurement and Explanation

Das, Rupak Kumar, Dodge, Jonathan

arXiv.org Artificial Intelligence, Jan-29-2025

With their advanced capabilities, Large Language Models (LLMs) can generate highly convincing and contextually relevant fake news, which can contribute to disseminating misinformation. Though there is much research on fake news detection for human-written text, the field of detecting LLM-generated fake news is still under-explored. This research measures the efficacy of detectors in identifying LLM-paraphrased fake news, in particular, determining whether adding a paraphrase step in the detection pipeline helps or impedes detection. This study contributes the following: (1) We show that detectors struggle more to detect LLM-paraphrased fake news than human-written text. (2) We find which models excel at which tasks (evading detection, paraphrasing to evade detection, and paraphrasing for semantic similarity). (3) Via LIME explanations, we discover a possible reason for detection failures: sentiment shift. (4) We discover a worrisome trend for paraphrase quality measurement: samples that exhibit sentiment shift despite a high BERTSCORE. (5) We provide a pair of datasets augmenting existing datasets with paraphrase outputs and scores. The dataset is available on GitHub.
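For readers unfamiliar with the metric named in this abstract, the following is a minimal sketch of the greedy-matching idea behind BERTScore: each token embedding in one text is matched to its most similar token embedding in the other, and the matches are averaged into precision, recall, and F1. The function names and the two-dimensional toy vectors are assumptions made for illustration. A real BERTScore computation uses contextual embeddings from a pretrained model (and optionally IDF weighting and baseline rescaling), none of which is reproduced here, and this is not the paper's evaluation code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_match_score(cand_vecs, ref_vecs):
    """Return (precision, recall, f1) from greedy token matching.

    cand_vecs and ref_vecs are non-empty lists of token embeddings.
    Hypothetical helper for illustration only.
    """
    # Precision: each candidate token takes its best match in the reference.
    precision = sum(
        max(cosine(c, r) for r in ref_vecs) for c in cand_vecs
    ) / len(cand_vecs)
    # Recall: each reference token takes its best match in the candidate.
    recall = sum(
        max(cosine(r, c) for c in cand_vecs) for r in ref_vecs
    ) / len(ref_vecs)
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy embeddings (assumed values, not produced by any model).
    candidate = [[1.0, 0.0]]
    reference = [[1.0, 0.0], [0.0, 1.0]]
    print(greedy_match_score(candidate, reference))
```

Because the score depends only on embedding similarity, two texts whose tokens sit close together in embedding space can score highly even when their sentiment differs, which is the kind of mismatch the abstract reports.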

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2501.18649

Country: North America > United States > Pennsylvania (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Benchmarking Foundation Models on Exceptional Cases: Dataset Creation and Validation

Kang, Suho, Park, Jungyang, Ha, Joonseo, Kim, SoMin, Kim, JinHyeong, Park, Subeen, Song, Kyungwoo

arXiv.org Artificial Intelligence, Dec-5-2024

Foundation models (FMs) have achieved significant success across various tasks, leading to research on benchmarks for reasoning abilities. However, there is a lack of studies on FMs' performance in exceptional scenarios, which we define as out-of-distribution (OOD) reasoning tasks. This paper is the first to address these cases, developing a novel dataset for evaluation of FMs across multiple modalities, including graphic novels, calligraphy, news articles, and lyrics. It includes tasks for instance classification, character recognition, token prediction, and text generation. The paper also proposes prompt engineering techniques like Chain-of-Thought (CoT) and CoT+Few-Shot to enhance performance. Validation of FMs using various methods revealed improvements. The code repository is accessible at: https://github.com/MLAI-Yonsei/ExceptionalBenchmark

dataset, gemini-1, lyric, (14 more...)

arXiv.org Artificial Intelligence

2410.18001

Country: North America > Canada > Newfoundland and Labrador > Labrador (0.04)

Genre: Research Report (1.00)

Industry:

Media > Music (0.68)
Leisure & Entertainment (0.68)
Media > News (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Crafting Narrative Closures: Zero-Shot Learning with SSM Mamba for Short Story Ending Generation

Sharma, Divyam, Santhanam, Divya

arXiv.org Artificial Intelligence, Oct-4-2024

Writing stories is an engaging yet challenging endeavor. Often, authors encounter moments of creative block, where the path forward in their narrative becomes obscured. This paper is designed to address such moments by providing an innovative solution: a tool that completes stories based on given prompts. By inputting a short story prompt, users can receive a conclusion to their story, articulated in one sentence or more, thereby enhancing the storytelling process with AI-driven creativity. This tool aims not only to assist authors in navigating writer's block but also to offer a fun and interactive way for anyone to expand on story ideas spontaneously. Through this paper, we explore the intersection of artificial intelligence and creative writing, pushing the boundaries of how stories can be crafted and concluded. To create our final text-generation models, we used a pre-trained GPT-3.5 model and a newly created fine-tuned SSM-Mamba model, both of which perform well on a comprehensive list of metrics including BERT score, METEOR, BLEU, ROUGE, and Perplexity. The SSM model has also been made public for the NLP community on HuggingFace models as an open-source contribution, which for the time being is a first-of-its-kind state-space model for the story-generation task on HuggingFace.

gpt-3, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2410.10848

Country: North America > United States > Michigan (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

PatentGPT: A Large Language Model for Patent Drafting Using Knowledge-based Fine-tuning Method

Ren, Runtao, Ma, Jian

arXiv.org Artificial Intelligence, Aug-26-2024

As humanity stands on the brink of a new era of technological innovation, the ability to rapidly transform creative ideas into protected intellectual property (IP) is more crucial than ever. However, the conventional processes for patent drafting are fraught with challenges, demanding a nuanced understanding of advanced field knowledge and technical concepts. Existing large language models (LLMs), while powerful, often fall short in this IP creation domain due to their lack of the specialized knowledge and context-awareness necessary for generating technically accurate patent documents. To bridge this critical gap, we propose a groundbreaking framework for Knowledge Fine-Tuning (KFT) of LLMs, designed to endow AI with the ability to autonomously mine, understand, and apply domain-specific knowledge. Our model, PatentGPT, leverages a unique combination of knowledge graph-based pre-training, domain-specific supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF). Through extensive evaluation, PatentGPT has demonstrated outstanding performance, scoring up to approximately 400% higher in patent-related benchmark tests compared to state-of-the-art models. By using the KFT method to strengthen the model's capability to not only assist but also augment human creativity and innovation, our approach sets a new standard for AI-driven intellectual property generation, paving the way for more efficient and effective invention processes.

knowledge, language model, patent, (15 more...)

arXiv.org Artificial Intelligence

2409.00092

Country:

North America > United States (0.47)
Asia > China > Hong Kong > Kowloon (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Law > Intellectual Property & Technology Law (1.00)

Technology:

Information Technology > Knowledge Management > Knowledge Engineering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

PEDANTS (Precise Evaluations of Diverse Answer Nominee Text for Skinflints): Efficient Evaluation Analysis and Benchmarking for Open-Domain Question Answering

Li, Zongxia, Mondal, Ishani, Liang, Yijun, Nghiem, Huy, Boyd-Graber, Jordan Lee

arXiv.org Artificial Intelligence, Jul-6-2024

Question answering (QA) can only make progress if we know whether an answer is correct, but for many of the most challenging and interesting QA examples, current efficient answer correctness (AC) metrics do not align with human judgments, particularly for verbose, free-form answers from large language models (LLMs). There are two challenges: a lack of diverse evaluation data, and models that are too big and non-transparent; LLM-based scorers correlate better with humans, but this expensive task has only been tested on limited QA datasets. We rectify these issues by providing guidelines and datasets for evaluating machine QA adopted from the human QA community. We also propose an efficient, low-resource, and interpretable QA evaluation method that is more stable than exact match and neural methods.
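To illustrate why exact match can misjudge verbose, free-form answers, here is a minimal sketch of two common baseline answer-correctness metrics: normalized exact match and token-overlap F1. This follows the widely used SQuAD-style normalization (lowercasing, stripping punctuation and English articles) and is not the PEDANTS method itself. The function names and the example strings are assumptions for illustration.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, strip punctuation and English articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """True only when the normalized strings are identical."""
    return normalize(prediction) == normalize(gold)


def token_f1(prediction, gold):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = "Eiffel Tower"
    verbose = "It is the Eiffel Tower in Paris"
    print(exact_match(verbose, gold))  # a correct but verbose answer fails
    print(token_f1(verbose, gold))     # partial credit from token overlap
```

In this toy case the verbose answer contains the gold answer yet fails exact match and receives only partial token-overlap credit, which is the kind of disagreement with human judgment the abstract describes.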

dataset, evaluation, pedant, (16 more...)

arXiv.org Artificial Intelligence

2402.11161

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > France (0.04)
Asia > Middle East > Jordan (0.04)
(12 more...)

Genre: Research Report (1.00)

Industry:

Transportation > Air (1.00)
Government > Regional Government > North America Government > United States Government (0.93)
Leisure & Entertainment > Games > Jeopardy! (0.93)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Credit Risk Meets Large Language Models: Building a Risk Indicator from Loan Descriptions in P2P Lending

Sanz-Guerrero, Mario, Arroyo, Javier

arXiv.org Artificial Intelligence, Jan-29-2024

Peer-to-peer (P2P) lending has emerged as a distinctive financing mechanism, linking borrowers with lenders through online platforms. However, P2P lending faces the challenge of information asymmetry, as lenders often lack sufficient data to assess the creditworthiness of borrowers. This paper proposes a novel approach to address this issue by leveraging the textual descriptions provided by borrowers during the loan application process. Our methodology involves processing these textual descriptions using a Large Language Model (LLM), a powerful tool capable of discerning patterns and semantics within the text. Transfer learning is applied to adapt the LLM to the specific task at hand. Our results derived from the analysis of the Lending Club dataset show that the risk score generated by BERT, a widely used LLM, significantly improves the performance of credit risk classifiers. However, the inherent opacity of LLM-based systems, coupled with uncertainties about potential biases, underscores critical considerations for regulatory frameworks and engenders trust-related concerns among end-users, opening new avenues for future research in the dynamic landscape of P2P lending and artificial intelligence.

bert score, language model, publication date, (12 more...)

arXiv.org Artificial Intelligence

2401.16458

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Spain > Galicia > Madrid (0.05)
South America > Chile (0.04)
(7 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Services > e-Commerce Services (1.00)
Banking & Finance > Loans (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Comparing Visual Reasoning in Humans and AI

Murlidaran, Shravan, Wang, William Yang, Eckstein, Miguel P.

arXiv.org Artificial Intelligence, Apr-29-2021

Recent advances in natural language processing and computer vision have led to AI models that interpret simple scenes at human levels. Yet, we do not have a complete understanding of how humans and AI models differ in their interpretation of more complex scenes. We created a dataset of complex scenes that contained human behaviors and social interactions. AI and humans had to describe the scenes with a sentence. We used a quantitative metric of similarity between the AI/human scene descriptions and a ground truth of five other human descriptions of each scene. Results show that machine/human agreement on scene descriptions is much lower than human/human agreement for our complex scenes. Using an experimental manipulation that occludes different spatial regions of the scenes, we assessed how machines and humans vary in utilizing regions of images to understand the scenes. Together, our results are a first step toward understanding how machines fall short of human visual reasoning with complex scenes depicting human behaviors.

arxiv, bertscore, oscar, (14 more...)

arXiv.org Artificial Intelligence

2104.14102

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Italy > Veneto > Venice (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Government (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.61)
